Cognitive Limits on Simultaneous Control of Multiple
Unmanned Spacecraft


Summary 


Space exploration 40 years into the future may include manned missions to parts of the 
outer solar system. A possible scenario may include sending a small fleet of craft with 
different primary missions. For example, a trailing spacecraft of nuclear powered 
electromagnets designed to shield the manned part of the fleet from solar radiation; 
halo spacecraft with powerful radars to scout for incoming objects; exploration and 
mining craft, etc. The fleet could regularly travel out of unaided visual range of each 
other, joining up when necessary for maintenance, exchange of materials such as fuel, 
or other necessities. Piloting these multiple craft could be economically accomplished if 
only one remote pilot on station at a time was necessary. 


This paper focuses on the cognitive limitations of a human astronaut performing the
multiple-vehicle piloting task, as little work has been done in this
specific area. However, a large body of cognitive research on the limitations of object
supervision and tracking for the task of air traffic control (ATC) exists. Additionally, 
there is an emerging body of research concerned with multiple unmanned vehicle 
piloting for heterogeneous missions. These are the two areas reviewed in detail as they 
relate to possible spacecraft missions. 


Pilots develop an internal mental representation of the identity, position, mission, and 
current direction of relevant objects. This is referred to colloquially as “the big picture.” 
The primary research question we seek to answer is whether there is a cognitive limit 
to the number of objects that can be monitored and tracked within the big picture. 
Secondarily, we seek to find whether this maximum number is limited by the
complexity of interaction, how those limiting factors are described, whether there is a
real-time objective measure that indicates when a pilot is approaching his maximum
capacity, and whether that capacity has been exceeded. The maximum number of
tracked objects is highly dependent on the complexity of the piloting and mission tasks 
at hand. 


Research is lacking in the area of cognitive limits on the number of spacecraft one pilot 
could control given any mission scenario. Currently, two models are being used to 
examine similar activities in air traffic control and remote piloting of multiple unmanned 
vehicles. In both areas it has been shown that the cognitive limit on the number of craft
under simultaneous control is 16 for simple destination selection, 7 for moderately
complex piloting and/or mission task completion, and 4 for complex heterogeneous 
craft. While additional future research may help to increase the automation component 
of aircraft and mission control, no current evidence exists to show that a complete 
mental picture can be maintained for more than about 16 objects at one time, even 
with external working memory augmentation. However, it has also been demonstrated 
that physiological variables can be objectively employed to indicate overload. Nominal 
success has been achieved in classifying physiological states near high workload thus 
enabling both prediction and possibly prevention of overload.


Chapter 1: Introduction


Due to the complexity, duration and numerous support requirements of future manned 
deep-space missions involving exploration, mineral exploitation, and possible 
colonization, a likely scenario will be the inclusion of unmanned fleets of support craft. 
Coupled with other requirements, an intensive research program is needed to 
investigate the cognitive limits on pilots and other operators responsible for the 
simultaneous control of multiple unmanned spacecraft making up the support fleet; a
“fleet” approach is proposed in an effort to optimize safety and exploratory reach. This
research effort would also aim at maximizing the functional efficiency of the mission 
and reducing the operation costs of unmanned vehicle fleets. 


In this scenario much about the ancillary craft is automated, in both
navigation and mission. Many of them will not require full-time piloting, but given that
they could be hundreds of miles from each other at any instant, they need
monitoring so that an unseen system failure or collision does not let them simply
disappear one day during the mission like a Martian probe. Accomplishing this
monitoring task and the occasional piloting task for multiple craft in the fleet could be 
economically accomplished if only one remote pilot on station at a time was necessary. 


We will focus here on the cognitive limitation of a human astronaut to perform the 
multiple-vehicle piloting task. It is not a surprise that there is little work in this specific 
area — in fact there were zero peer-reviewed articles in the major journals concerning 
remote piloting of multiple spacecraft (published in the last 30 years). There is,
however, a large body of cognitive research on the limitations of object supervision and
tracking for the task of air traffic control (ATC). There is additionally an emerging body 
of research concerned with multiple unmanned vehicle piloting for heterogeneous 
missions. These are the two areas reviewed in detail as they relate to possible 
spacecraft missions. 


When a pilot or ATC operator is in control of several craft, they develop an
internal mental representation of the identity, position, mission, and current direction of 
each object tracked. This is referred to colloquially as “the big picture.” This mental 
representation is also called situational awareness. Keeping all of the information about 
all of the objects straight as long as they are in scope is the goal. 


The primary research question we seek to answer is whether there is a cognitive limit 
to the number of moving objects that can be maintained in the big picture. Secondary 
questions are whether this maximum number is limited by complexity, how those 
limitations might be described, whether there is a real-time objective measure that will 
indicate when a pilot is approaching their maximum capacity, and whether that capacity 
has been overloaded. 


For the current treatise we will consider only traditional humans as pilots. Cyborg- 
enhanced astrobots are a topic for another tome. 


In discussing cognitive limitations, it is useful to introduce the concepts of task demand, 
mental workload, and a simplistic model of multiple resource theory. For a given task, 
the gross level of neural activity required is a representation of the mental demands of 
the task. Of course, a complex task can demand varied resources such as visual and
audio processing, and this is where multiple resource theory enters: different resources
can be considered independent if demands on one resource do not tax the available
capacity of the other. Performance of a task can be high or low: in general if spare
capacity is available, performance is high, and if capacity is limited or exceeded, 
performance is low. Moreover, task demand enforces a fixed theoretical relationship 
between mental workload and task performance: this is shown in Figure 1. 


[Figure 1 graphic: Relationship between Workload and Task Performance. Vertical axis: Performance (High at top). Horizontal axis: Task Demand, increasing to the right.]


Figure 1. One-dimensional Representation of Changes in Performance
as Workload Varies. Task demand increases to the right. In the three A
regions, performance remains unchanged: A2 is optimal, where a trained
operator exerts minimal effort to maintain a set level of performance. At times
of low demand the operator exerts effort to maintain vigilance (A1) until
demand is so low that the subject disengages from the task (D). The workload curve
is not well defined at very low levels of demand (dotted lines). At the other end, 
as demand increases, the subject can exert effort to keep up with demand (A3). 
Additional effort maintains performance until degradation begins (B), and in the 
overload condition, region C, performance is degraded beyond acceptable levels. 
While effort is maintained in overload, some low level of performance exists.


Chapter 2: Measurement of Mental Workload


There are three approaches to measuring mental workload in a subject performing a 
primary task. The first is subjective evaluation, either post-hoc self-report 
questionnaires concentrating on how “busy” one may have felt, or an experimental 
observation of activity. The second is performance measures, an objective evaluation of 
how well the subject completes the primary task, a secondary task, or an 
experimentally inserted reference task. The final approach is to record real-time 
physiological measures, with the assumption that increased workload increases anxiety 
and this will be exhibited by changes in the autonomic nervous system (ANS). 


SUBJECTIVE MEASUREMENTS 


Self-report measures are appealing because they get inside the mind that was 
performing the task. There is no absolute objective scale to measure one person’s “fully 
occupied” from another person’s view of the same state; however, through rating 
scales and self-drawn graphs, one can obtain an accurate picture of how perceived 
workload evolved during the experiment. Perceived workload is important because it is 
what needs to be maintained between a subjective minimum, where attention may 
wander, and a subjective maximum, where increased emotion may decrease 
performance capacity. 


The most frequently used standardized self-report tools are the NASA Task Load Index 
(NASA-TLX or just TLX) and the Subjective Workload Assessment Technique
(SWAT).1,2,3,4 The TLX is a subjective workload assessment based on a multi-
dimensional rating questionnaire. An overall workload score is derived based on a
weighted average of ratings on six subscales: mental demands, physical demands, 
temporal demands, own performance, effort, and frustration. SWAT is a two-step 
assessment of three workload factors: time load, mental effort load, and psychological 
stress load. In the first step, hypothetical activities are ranked according to perceived 
workload. In the second step, the experimental task is evaluated post-hoc, using a 1-3 
rating scale for each of the three dimensions. An interval scale of workload is derived,
from 0-100, based on the reference data collected for each subject in the first step, and
the evaluation of the experimental task. A custom self-report can also be designed by 
the experimenter to specifically focus on the research questions in a given experiment. 
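

As an illustration of the weighted average described above, the following Python sketch computes an overall TLX score. It assumes the commonly used procedure in which each subscale is rated from 0 to 100 and its weight is the number of times it was chosen across 15 pairwise comparisons, so the weights sum to 15. The function name and the example ratings are hypothetical and are not taken from any study reviewed here.

```python
from typing import Dict

# The six TLX subscales named in the text.
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")


def tlx_overall(ratings: Dict[str, float], weights: Dict[str, int]) -> float:
    """Return the weighted-average workload score.

    ratings: subscale -> rating (assumed here to be on a 0-100 scale).
    weights: subscale -> tally from pairwise comparisons (assumed to sum to 15).
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(ratings[s] * weights[s] for s in SUBSCALES)
    return weighted_sum / total_weight


# Hypothetical operator: ratings and pairwise-comparison tallies are made up.
example_ratings = {"mental": 80, "physical": 20, "temporal": 70,
                   "performance": 40, "effort": 60, "frustration": 50}
example_weights = {"mental": 5, "physical": 0, "temporal": 4,
                   "performance": 2, "effort": 3, "frustration": 1}
example_score = tlx_overall(example_ratings, example_weights)  # 990 / 15 = 66.0
```

Because the weights are tallies, a subscale the operator never selected (physical demand in this example) drops out of the score entirely.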


The second type of subjective measure is evaluation by an expert observer. In this type
of measurement, assumptions are made about the mental activity of the subject based on
the activities being performed. The advantage of this approach is that there is minimal
person-to-person variance if the same evaluator is employed. The disadvantage is that
an outside observer can miss workload-multiplying factors, such as task
complexity contributing to the subject feeling overwhelmed. An example of this
approach is the concept of utilization. In calculating utilization it is assumed that the
subject must cognitively address one issue at a time in serial order. The measure of
utilization is percent time busy, that is, time spent addressing any issue as opposed to
waiting or monitoring for the next event. In general it is observed for control and supervisory
tasks that performance begins to degrade at around 70% utilization.5 Although arguably not a
perfect measure of workload, utilization has the advantage of simplicity, objectivity,
and a quantitative scale, allowing it to be used in threshold detection and prediction, as
well as in computer-assisted workload balancing.
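

The utilization measure can be sketched in a few lines of Python. This is a minimal illustration, assuming that an observer has logged the start and end time of each issue the operator addressed. The interval data, the function name, and the choice to merge overlapping intervals are assumptions of this sketch; only the roughly 70% degradation threshold comes from the text.

```python
from typing import Iterable, Tuple

# Approximate utilization at which performance is reported to start degrading.
DEGRADATION_THRESHOLD = 0.70


def utilization(busy_intervals: Iterable[Tuple[float, float]],
                window: Tuple[float, float]) -> float:
    """Fraction of the observation window spent busy (0.0 to 1.0).

    busy_intervals: (start, end) times, in seconds, of each issue addressed.
    window: (start, end) of the observation period, in seconds.
    Overlapping intervals are counted once, consistent with the assumption
    that issues are addressed serially.
    """
    w_start, w_end = window
    if w_end <= w_start:
        raise ValueError("window must have positive duration")
    # Clip every interval to the window and process in chronological order.
    clipped = sorted((max(s, w_start), min(e, w_end))
                     for s, e in busy_intervals
                     if e > w_start and s < w_end)
    busy = 0.0
    covered_until = w_start
    for start, end in clipped:
        start = max(start, covered_until)  # skip time already counted
        if end > start:
            busy += end - start
            covered_until = end
    return busy / (w_end - w_start)


# Hypothetical 10-minute observation with four logged busy periods.
example_log = [(0, 120), (150, 300), (280, 420), (500, 560)]
example_utilization = utilization(example_log, (0, 600))
example_at_risk = example_utilization >= DEGRADATION_THRESHOLD
```

In this made-up log the operator is busy for 450 of 600 seconds, so utilization is 0.75 and the threshold check flags the period as one where degradation might be expected.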


PERFORMANCE MEASURES 


In the laboratory, reaction time to an event stimulus is a typical real-time measure of speed.
Outside the laboratory, in a more natural environment, speed is an indirect measure of
a subject's ability to keep up with a given rate of events.ᵃ Accuracy is measured in
both the laboratory and naturalistic environments: for example, in an air-traffic control 
task, properly handing off a plane to the next controller, as it leaves the first 
controller's monitored airspace, is measured as a successful task completion. 


Performance measurement with a secondary task follows one of two paradigms.
First, in the dual-task paradigm, performance on the secondary task is required and
primary task performance is thus an indication of workload. In the second paradigm,
instruction is given to maintain the primary task performance, and performance on the
secondary task is thus a measure of “spare capacity” for additional workload.


Care is required in selection of secondary tasks in order to ensure they affect the 
resources one wishes to probe. For example, in a driving scenario, a minimally 
intrusive task is to push a button on the floor or steering wheel when a light flashes in 
the field-of-view of the driver. This visual "detect-and-respond" task adds to what is 
primarily a complex visual and motor coordination task that probes the capacity of 
visual attention.6 This is different from a secondary task utilizing different resources,
such as having a conversation while driving. When planning the paradigm, a model 
needs to be developed that treats (or explicitly ignores) individual tasks and 
interactions among them. 


Reference tasks are executed before and after primary tasks. Typically reference tasks 
focus on trending in performance due to effects such as fatigue. One important case of 
reference tasks is to normalize an individual’s current capacity for mental workload. 
Such a pre-performance measure can be used to adjust maximum workload to 
compensate for day-to-day variation. 


PHYSIOLOGICAL MEASURES 


The human nervous system is anatomically divided into the Central Nervous System
(CNS) and the Peripheral Nervous System (PNS). The CNS includes the brain and the
spinal cord. The PNS is made up of the somatic division, which innervates the skin, 
voluntary muscle, and joints, and the autonomic division, which mediates visceral 
sensation as well as executes motor control of smooth muscle, viscera, and endocrine 
glands. The autonomic division consists of sympathetic, parasympathetic, and enteric 
systems. The sympathetic system mediates response to stress, while the 
parasympathetic system works to maintain homeostasis and conserve body resources. 


ᵃ Reaction time is not typically measured in real time in the field, as events occur at unplanned times. Post-hoc
analysis can resolve reaction times down to a resolution comparable to innate motor reaction variance, on the
order of tens of msec.

ᵇ Contrary to what a layperson may think, the vast majority of errors in air traffic control do not result in collisions
or even near misses; rather they are mistakes in procedure.

The enteric system executes control of smooth muscle. Although the CNS and PNS are
anatomically distinct, they are functionally intertwined. When discussing the function of
the autonomic division, it is customary to refer to it as the Autonomic Nervous System,
or ANS.7


Changes in global arousal or activation through changing workload can result in 
changes in physiological activity. These measures are advantageous as changes are 
measurable continuously, in real-time, and usually unobtrusively in a naturalistic 
setting. The drawback of using physiology alone is that there is no direct measure of 
primary task performance. 


Cardiac Function 


Normal beating of the human heart produces a distinct and repeating pattern of 
electrical activity measurable throughout the body for any two sample points that cross 
the chest.ᶜ The typical electrocardiogram signal is shown in Figure 2.


Figure 2. Typical EKG Signal for a Normal Heartbeat. Portions of the waveform are labeled P, Q, 
R, S, and T. 


Detection of the R-wave allows measurement of frequency, time,ᵈ and amplitude. For
continuous monitoring, heart rate measurements vary considerably and in a non-linear
fashion; therefore the inter-beat interval (IBI), the time between R peaks, is
preferred, as it is more normally distributed in the absence of signal, reducing noise
in the measurement.8 Averaging the heart rate over minutes of task performance and
comparing to baseline yields a reliable estimate of increased metabolic function.9 An
additional measure is the heart rate variability (HRV), calculated by dividing the
standard deviation of IBI by an average value of IBI within a sample period. Additional
measurements can be made by decomposing the spectra of HRV into low, mid, and 


ᶜ In fact, a typical introductory physics course for life science students will include a laboratory experiment where
the heart rhythm is measured between electrodes located on the right wrist and left ankle.
ᵈ This time measurement is properly a phase measurement, with the phase relative to some reference event.

high frequency components which have different noise contributions (for example, core
temperature changes, blood pressure or speech, and respiration, respectively).9
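

The IBI and HRV definitions above can be illustrated with a short Python sketch. It assumes that R-peak times have already been detected and are given in seconds. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics
from typing import List, Sequence


def ibi_from_r_peaks(r_peak_times: Sequence[float]) -> List[float]:
    """Inter-beat intervals: time differences between successive R peaks."""
    return [later - earlier
            for earlier, later in zip(r_peak_times, r_peak_times[1:])]


def mean_heart_rate_bpm(ibis: Sequence[float]) -> float:
    """Average heart rate in beats per minute over the sample period."""
    return 60.0 / statistics.mean(ibis)


def hrv_coefficient(ibis: Sequence[float]) -> float:
    """HRV as defined in the text: standard deviation of IBI / mean IBI.

    The population standard deviation is used here as an assumption.
    """
    return statistics.pstdev(ibis) / statistics.mean(ibis)


# Hypothetical R-peak times (seconds) for a short sample period.
example_peaks = [0.0, 0.8, 1.8, 3.0]
example_ibis = ibi_from_r_peaks(example_peaks)      # about [0.8, 1.0, 1.2]
example_rate = mean_heart_rate_bpm(example_ibis)    # about 60 beats per minute
example_hrv = hrv_coefficient(example_ibis)         # about 0.163
```

Dividing by the mean IBI makes the measure dimensionless, so values from operators with different resting heart rates can be compared more directly.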


Blood pressure and its variability can also be measured. For continuous monitoring, a
water-filled finger cuff that matches the intra-arterial pressure can be used to
monitor variability.12


CNS Measurements 


Brain activity can be recorded unobtrusively using low-count
electroencephalography (EEG),ᵉ with the minimally obtrusive techniques of EEG,
near-infrared spectrometry (NIRS), and trans-cranial Doppler sonography (TCDS), or with the
non-invasive laboratory techniques of functional MRI (fMRI), magnetoencephalography
(MEG), or positron emission tomography (PET). The latter three techniques are for
brain research only and do not have any current naturalistic research studies (although 
see Genik!* for a prognosis on generation-after-next technologies including NIRS, PET, 
fMRI, and MEG). 


EEG measurements are typically divided into spectra and the relative power in the
bands 0-4 Hz (δ), 4-8 Hz (θ), 8-13 Hz (α), 13-30 Hz (β), and 30-100 Hz (γ). NIRS
measures the BOLD effectᶠ and is related to localized γ activity. TCDS measures CO2 as
the byproduct of localized increased metabolism and is also an indirect measure of
neural activity which has been shown to be related to vigilance.13 For EEG experiments
in mental workload, changes are typically reported in the θ and α bands, though more
recently β and γ bands have shown sensitivity.14
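

To make the notion of relative band power concrete, the sketch below splits the power spectrum of a single EEG channel into the five bands listed above. It is a minimal illustration using a direct discrete Fourier transform from the Python standard library, not a production signal-processing pipeline. The function name, the treatment of band edges (lower edge inclusive, upper edge exclusive), and the removal of the DC component are assumptions of this sketch.

```python
import cmath
import math
from typing import Dict, Sequence

# Band limits in Hz, as listed in the text.
BANDS = {"delta": (0.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 100.0)}


def relative_band_power(samples: Sequence[float],
                        sampling_rate_hz: float) -> Dict[str, float]:
    """Share of spectral power in each band (shares sum to 1.0).

    Uses a direct DFT, which is slow but dependency-free; it is adequate
    for short windows such as one second of data.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):          # positive frequencies only
        freq = k * sampling_rate_hz / n
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                power[name] += abs(coeff) ** 2
                break                        # frequencies above 100 Hz are ignored
    total = sum(power.values())
    return {name: (p / total if total else 0.0) for name, p in power.items()}


# Synthetic one-second signal: a 10 Hz component plus a weaker 6 Hz component.
example_rate_hz = 256
example_signal = [math.sin(2 * math.pi * 10 * t / example_rate_hz)
                  + 0.5 * math.sin(2 * math.pi * 6 * t / example_rate_hz)
                  for t in range(example_rate_hz)]
example_shares = relative_band_power(example_signal, example_rate_hz)
```

For this synthetic signal the amplitudes are 1.0 and 0.5, so the power ratio is 4 to 1 and the α band holds about 80% of the power with the remainder in θ.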


Event-related potentials (ERP) are peaks of activity measured at the skin indicative of 
several tens of thousands of neurons firing coherently for a short time. For example, a 
well-studied cognitive ERP is P300, a bump in the EEG signal tens of msec wide
occurring 200-400 msec after an event. Some success in utilizing a task-irrelevant
secondary audio stimulation and N100 has been shown,15 but little further development
to this approach has been found in the last 15 years. 


Ocular Measurements 


Measurements involving eye fixations, dwell time (temporal length of a fixation), and
pupillary changes are well-established metrics of workload in visual searching tasks.16
Additional measures of ocular changes include blink rate, blink duration, blink latency,
and eye movement. These are recorded using one of various types of eye-tracking (ET)
or electrodes to measure an electrooculogram (EOG). ET data will include position of a
fixation and the time of each eye movement (or saccade), whereas an EOG only
identifies the time at which the muscles controlling eye blinks or eye position activated.


ᵉ Low-count EEG is generally less than 10 electrodes, and usually 3 or 5.
ᶠ The Blood Oxygen Level Dependent (BOLD) effect is a local change in the oxygen saturation ratio near neural
activity due to metabolic and vascular action. This change is detectable in the infrared spectra.


Skin Measurements


The ANS controls the opening and closing of sweat glands in response to stress and
anxiety. These changes can be observed by measuring electrodermal activity, typically
skin conductance response (SCR),17 but also skin potentials (SP) and skin temperature 
(ST). A standard score of these measures can be calculated by collecting resting state 
data prior to the task interval. 
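

The standard score mentioned above can be sketched as follows. This is an illustrative example only: the function name and the sample conductance values are hypothetical, and the use of the sample standard deviation of the resting data is an assumption.

```python
import statistics
from typing import List, Sequence


def standard_scores(resting: Sequence[float],
                    task: Sequence[float]) -> List[float]:
    """Express task-interval readings in units of resting-state deviation.

    resting: readings collected before the task (for example, skin conductance).
    task: readings collected during the task interval.
    """
    baseline_mean = statistics.mean(resting)
    baseline_sd = statistics.stdev(resting)  # sample standard deviation (assumed)
    if baseline_sd == 0:
        raise ValueError("resting data has no variation; score is undefined")
    return [(value - baseline_mean) / baseline_sd for value in task]


# Hypothetical readings in microsiemens.
example_resting = [2.0, 4.0, 6.0]   # mean 4.0, sample standard deviation 2.0
example_task = [4.0, 8.0]
example_scores = standard_scores(example_resting, example_task)  # [0.0, 2.0]
```

Scoring against each operator's own resting state removes much of the person-to-person difference in absolute skin conductance, potential, or temperature.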


Serum Levels of Hormones 


Hormone levels are direct results of activity in the ANS. It is natural to want to 
continuously monitor certain stress hormones such as adrenaline or cortisol. The 
difficulty lies in measurement time — typical fast assays of salivary cortisol still require 
about 15 minutes.’ Until reliable in-situ monitoring is available, serum levels of 
hormones will only play a supporting role in establishing baseline capacities or post-
incident analysis.


Chapter 3: Studies in Cognitive Workload for Air Traffic
Controllers


There are a few areas of research applicable as analogues to the piloting of several 
aircraft. The largest of these areas analyzes the supervisory functions of air traffic 
controllers (ATC). In a typical ATC duty environment, a single operator is responsible 
for many fully-autonomous aircraft. ATC is either airport-based or en-route. Airport- 
based facilities include ground control, local control for active runways (Tower control), 
and approach control, which in the US is called Terminal Radar Approach Control, or 
TRACON. A zone of air-controlled space assigned to a specific control center is that 
center’s sector; similarly, a given controller is responsible for a sector of airspace. 
Between airport-controlled sectors, aircraft are monitored by an en-route facility. 
TRACON is the most cognitively demanding function in this chain. Terminal controllers, 
as ATCs are called at TRACON facilities, are responsible for departures after take-off, 
and approaches and flyovers within about a 50-mile radius of the airport.ᵍ Approaching
aircraft need to be vectored by the controller into an appropriate flight path for landing, 
avoiding all other aircraft in the air or soon to be in the air, and then handed off to the 
Tower controller for landing and ground instruction. Departing flights and flyover traffic 
mainly need to be monitored for conflicts. Approaching aircraft are by far the most 
cognitively challenging in the ATC task. Images of the air traffic controller environment and
displays are shown in Figure 3.ʰ


Errors in flight control are called anomalies in the industry. The most common aircraft 
anomaly is deviation from flightpath in en-route sectors. This anomaly is corrected by 
pilots themselves or after instruction from the appropriate ATC. The most common ATC 
error is miscommunication between controllers in an aircraft handoff between sectors.19
A study in 1997 by the National Academies showed that ATC errors occurred in both
high and low workload conditions, as predicted by overload and disengagement.20


In looking at the cognitive limits in ATC, we should seek where typical controllers enter 
the B region of workload. Traffic load defined as simply the number of aircraft does not 
by itself show the complete picture of ATC workload. In a study of professional 
controllers in 2006, Boag recorded subjective measures of workload and reaction time 
when static air traffic displays were presented. Displays included air traffic conflicts of 
differing complexity that required resolution. Complexity in the display was objectively 
ranked using the Method of Analysis of Relational Complexity (MARC).21 Results showed
that a relatively small number of aircraft can greatly increase the perceived workload.
Boag concludes that perceived complexity is the number one factor in determining
workload. Although conflicts are the major source of complexity in the ATC
task, and can be modeled somewhat using a combination of aircraft separation
and transitionⁱ variables, individual differences still have significant impact on when a
controller may reach overload.


ᵍ Each facility will vary in TRACON sector radius. For smaller airports, TRACON functions may be performed by a
nearby en-route facility.

ʰ Of historical note in the US air traffic control industry was the industry-wide strike begun on August 3, 1981, by
the nearly 13,000 ATC specialists. Only 1300 obeyed a Presidential order to return to work under the “peril to
national safety” provision of the 1947 Taft-Hartley Act. On August 5, 1981 the remaining 11,345 controllers were
fired and banned for life from federal service. The FAA rebuilt the force to pre-strike levels during the rest of the
1980s. This event resulted in a dearth of research during the 1980s on air traffic control professionals.

ⁱ An aircraft transition is an event such as landing, hand-off to another controller, or entering/leaving a sector,
where a sector is defined as a controller's airspace of responsibility.


In a post-hoc examination of sources of human error in taped controller data, Chang
developed a conceptual model of the ATC task within a system of human, software,
hardware, and environment using the SHEL22 approach of ergonomics. This study also
examined several aspects of personnel management, but when overload was examined
as the source of error, it was shown that all factors need to be considered along with
their interaction. That is, the human and the external task cannot be considered in
isolation from the environment of the other humans, and the augmentation capabilities of
software, hardware, and the performance environment significantly affect error rates.24


Operator overload can result in subsequent errors if sufficient recovery time is not 
allotted. Di Nocera examined specific error rates after short periods of overload. The
first part of the study demonstrated that these so-called post-completion errors
occur in the periods immediately after peaks in workload. The second part of the study
examined whether an augmentation tool to assist in medium term conflict detection 
could reduce these errors. The subject population included 18 military ATC of varying 
experience. Workload was based on self-reported NASA-TLX. Results showed that 
augmentation processes helped junior controllers, but senior controllers were 
unaffected by the assistance.25


Figure 3. Air Traffic Control: a) Overview of a large TRACON facility. Potomac is pictured. b) A
single ATC TRACON station. Beside the radar display is a stack of memory aid blocks on which the
controller writes essential aircraft information; as long as there is a physical block to the right,
there should be a corresponding “blip” on the screen. c) TRACON display from Minneapolis-St. Paul.
On this display are aircraft under control (undesignated), as well as craft from 3 other controllers,
East (E), North (N), and Ground (G). Below each aircraft name is controller designation, altitude in
hundreds of feet, and airspeed in tens of knots.


Lamoureaux researched in 1999 whether a proposed ATC augmentation system could
safely allow smaller distances between aircraft and thus increase traffic capacity
without building new airports or runways. Pairs of aircraft were characterized based on
their direction of travel and separation as shown in Table 1. During the task, operators
rated their instantaneous perceived workload on a five-point scale roughly
corresponding to D, A1, A2, A3/B, and B/C.ʲ Operators were prompted for their
self-assessment once every two minutes. The experiment was able to generate
self-assessment values between 1 and 4; no overload conditions were generated in this
study. Using a model based on task variance and complexity of current aircraft under 
control, the researchers were able to predict perceived workload 74% of the time, but 
more importantly, were able to predict a self-rating of 4 for 80% of the cases. 
Predicting the boredom value of 1 was less successful, at about 60%. Besides the 
numerical predictions, the authors conclude that complexity of the task drives perceived 
workload.?6 


Table 1. Variables used in Determining Complexity of the Traffic Relationship between Pairs of Aircraft. Using
four variables with three thresholds gives different classes of complexity. For example, an aircraft pair could be 5
miles apart (3 to 7 threshold), traveling in the same lateral direction, flying with 2500 ft of vertical separation
(> 2000 ft), and both straight and level. There are 81 such combinations.


                       Threshold
Variable               Low                        Mid                                        High
Lateral separation     < 3 miles                  3 to 7 miles                               > 7 miles
Lateral direction      Same direction             Opposite direction                         Crossing
Vertical separation    < 800 ft                   800 to 2000 ft                             > 2000 ft
Vertical direction     Both straight and level    One straight, one climbing or descending   Both climbing or descending
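

The count of 81 classes follows from four variables with three levels each (3 × 3 × 3 × 3 = 81). The Python sketch below enumerates the classes and classifies the example pair from the caption. The dictionary keys, function names, and the decision to treat the boundary values (3 and 7 miles, 800 and 2000 ft) as belonging to the mid level are assumptions of this illustration, not details from the study.

```python
from itertools import product
from typing import Dict, List

# Levels for each variable, in Low / Mid / High order, as listed in Table 1.
THRESHOLDS = {
    "lateral_separation": ("< 3 miles", "3 to 7 miles", "> 7 miles"),
    "lateral_direction": ("Same direction", "Opposite direction", "Crossing"),
    "vertical_separation": ("< 800 ft", "800 to 2000 ft", "> 2000 ft"),
    "vertical_direction": ("Both straight and level",
                           "One straight, one climbing or descending",
                           "Both climbing or descending"),
}


def complexity_classes() -> List[Dict[str, str]]:
    """Enumerate every combination of one level per variable."""
    return [dict(zip(THRESHOLDS, combo))
            for combo in product(*THRESHOLDS.values())]


def level(value: float, low_edge: float, high_edge: float) -> str:
    """Map a numeric separation onto 'low', 'mid', or 'high'."""
    if value < low_edge:
        return "low"
    if value > high_edge:
        return "high"
    return "mid"  # boundary values are assigned to mid (an assumption)


all_classes = complexity_classes()
# Example pair from the caption: 5 miles apart, 2500 ft vertical separation.
example_lateral = level(5.0, 3.0, 7.0)          # "mid"
example_vertical = level(2500.0, 800.0, 2000.0)  # "high"
```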


Physiological indicators of stress were measured at two low-traffic control centers
(Fayetteville, AR and Roswell, NM) and one higher traffic center (Oklahoma City). Heart
rates were measured along with hormone secretion levels in urine. The urine specimens
were pooled before analysis into two groups throughout a 5-day workweek: the
8-hour workday, and the night time after work. Additionally, controllers completed the
State-Trait Anxiety Inventory before and after each workday. Results showed that lower traffic
centers exhibited lower stress levels, and that the best biological indicator of stress was
epinephrine levels from urine rather than HR. The authors concluded that traffic load
and complexity were the main sources of stress in the ATC task rather than the nature
of the job itself.27


A straightforward experiment to look at physiological responses using EEG and EOG to 
lapses in attention during ATC tasks was performed by Peiris in 2005. The goal of the 
study was to categorize expert analysis of EEG/EOG data to develop an automated 
analysis program that could read the data online without human intervention to alert 
operators to attention lapses and low levels of alertness. Professional ATC operators 
were given 10-minute intervals of the psychomotor vigilance task (PVT). Recorded data
were evaluated by several human expert EEG and EOG analyzers. These experts were
not able to correctly identify alertness or attention lapses (EEG identified only 6 of 101


ʲ In this study, the A3/B rating described “Non-essential tasks suffering, could not work at this level for long,” while
B/C described getting behind and losing situational awareness.

lapses). Peiris concluded that a built-from-scratch automated system is needed to
identify subtle features, especially in the low-count electrode EEG (5 electrodes).28

In the definition of complexity, it is important to note that one should not focus entirely 
on a single aspect of the ATC task in the laboratory. Donald defines complexity in two 
aspects: task complexity independent of the event rate, and ancillary aspects of the job 
in a naturalistic environment. The complexity of the task as a whole needs to be 
considered instead of focusing on a single aspect such as monitor and detect and the 
rate at which this task can be completed.29 


The naturalistic environment was in fact utilized by Brookings in 1996.30 Moreover, in
situ measurements were conducted by Collet in 2009.31 Both studies examined TLX
ratings, as well as several ANS variables. Brookings additionally utilized EEG. 


In the Brookings study, three simulated TRACON sessions were conducted. The first 
session varied traffic volume between low, medium and high levels, while the second 
session varied task complexity at a constant rate of aircraft. The third scenario was 
conducted with an overwhelming number of aircraft, the goal being to take 
physiological data in the condition where situational awareness is lost.ᵏ


The traffic load variance session lasted 45 minutes with three 15-minute sessions where 
the controller was required to handle 6, 12, and 18 aircraft; the order of presentation 
was counterbalanced across subjects. Other complexity factors, such as the ratio of 
overflights, arrivals, and departures, were kept constant. 


In the complexity variation session, the number of aircraft was kept constant at 12, 
while various complicating factors were modulated. Changing complexity factors 
included: 


• Altering the ratio of arriving to departing and flyover traffic.


• Changing the probability that a pilot didn’t hear or failed to execute a controller’s
instruction.


• Increasing or decreasing the heterogeneity of aircraft type.


In the overload session, 15 aircraft were presented in 5 minutes. 


Physiological variables monitored included heart activity using two electrodes on the
chest, EOG using electrodes around the eyes,ˡ respiration using elastic transducer bands,
and 19 channels of EEG using a cap outfitted with a standard 10-20 configuration.ᵐ
Task performance points were awarded for successfully handling aircraft, minus any
points for operational errors such as separation conflicts, hand-off errors, and missed
approaches. TLX ratings were recorded between workload conditions during a designed
1-minute lull in traffic. The simulation was considered quite difficult, even for
professional Air Force ATC, and participants were required to practice until they did not
crash any planes in any of the scenarios; this required approximately 6 hours of
practice per participant before the experiment was conducted.


ᵏ In colloquial terms, ATC call this “losing the picture.”

ˡ Electrodes are pointed out here as more recent methodology could use infrared optical devices to record heart
and ocular activity.

ᵐ The standard 10-20 configuration refers to electrodes placed every 10% or 20% of the total distance between
right-left and anterior-posterior anatomical markers. A 10-10 configuration would include twice the electrodes, etc.


Only one of the eight controllers in the Brookings study rated the overload condition as 
a loss of situational awareness. The results compared the TLX, primary task 
performance, and physiological measures to low, medium, and high workload conditions, 
as well as the max or overload condition. Primary task performance is represented in 
Figure 4 (this chart was recreated visually from the source chart to accurately represent 
all trends), showing a trending effect for complexity but not volume. Additional results 
showed that changes in task difficulty (volume or complexity) produced changes in TLX, 
eye blink rate, respiration rate, and the EEG power spectra. The EEG power spectra 
were different for changes in volume versus changes in complexity. There were no 
observed significant correlations with heart rate or heart rate variability. The authors 
conclude that psychophysiological data can be used to accurately measure workload in 
real-time, an observation they note confirms earlier work on F4 crew members 
performing flight tasks of modulated complexity. 


The Brookings data was reanalyzed by Wilson in 2003 using an artificial neural network 
approach as well as a stepwise discriminant analysis to classify a physiological state as
either operational or overloaded. Wilson successfully classified the overload condition
consistently in more than 98% of the cases. The authors acknowledge that day-to-day
psychophysiological variation would need to be normalized and that
further research is required.32


Figure 4. Representation of Performance Results from Brookings Study.30 Shown are the three scenarios
with modulation of traffic volume, complexity, and the overload condition. Note that the low workload entry for
the volume modulation performance (6 planes) was already 80% and this was similar to the 12-plane medium
complexity performance. Only the low complexity, 12-plane scenario showed near 100% primary task
performance.


The more recent Collet study recorded 5 ANS variables from 25 participants during real
ATC operations. The population, mean age of 44, included only fully qualified operators, 
who were monitored for one hour during TRACON duty at Saint Exupéry International 
Airport (Lyon, France). Correlation analyses were performed with the number of aircraft 
the operator was currently controlling. No adjustment was made for task complexity; 
however, data were acquired between 6 and 9 PM local time to collect medium and high 
workload data. Each participant handled between 1 and 10 aircraft during the study. 
The results of the correlation analysis are shown in Table 2. The authors conclude that 
changing the number of aircraft for professional ATCs produced correlations in 
physiological measures for SC, SBF, and IHR.31


Table 2. Correlations among Physiological Variables in a study of air traffic controller workload
modulation with variable number of aircraft. NA: number of aircraft; TLX: NASA self-report workload
metric; Std SC: normalized skin conductance; Std SP: normalized skin potential; Std SBF: normalized
capillary blood flow measured through the skin; Std ST: normalized skin temperature; Std IHR:
normalized instantaneous heart rate. Significance is given beneath each coefficient (NS: not
significant). SC, SBF, and IHR show significant correlation with changes in NA. Normalizations
(standardizations) were performed against baseline data per subject to decrease inter-subject noise.31


          NA          TLX         Std SC      Std SP      Std SBF     Std ST

NA        1

TLX       .98         1
          p<.001

Std SC    .93         .89         1
          p=.002      p=.008

Std SP    [illegible] .67         .91         1
          NS          NS          p=.005

Std SBF   -.97        -.94        -.87        [illegible] 1
          p<.0001     p=.001      p=.02       NS

Std ST    -.79        -.80        -.62        -.43        .82         1
          NS          NS          NS          NS          NS

Std IHR   .98         .95         .97         .85         -.93        -.88
          p<.0001     p<.0001     p<.0001     p=.03       p=.002      p=.005
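
The per-subject standardization and the correlation with the number of aircraft described for Table 2 can be sketched as follows. This is a minimal illustration only: it assumes a z-score against each subject's own baseline recording and a plain Pearson coefficient, and the function names and every sample value are hypothetical rather than taken from the Collet study.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(samples, baseline):
    """Express samples as z-scores relative to a subject's own baseline recording."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [(x - mu) / sigma for x in samples]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data for one controller (illustrative values only).
baseline_hr = [62.0, 64.0, 63.0, 61.0, 65.0]   # resting heart rate, bpm
aircraft_count = [1, 3, 5, 7, 10]              # NA at each sample
heart_rate = [66.0, 70.0, 75.0, 79.0, 86.0]    # IHR at each sample, bpm

std_ihr = standardize(heart_rate, baseline_hr)
r = pearson_r(aircraft_count, std_ihr)
print(f"r(NA, Std IHR) = {r:.3f}")
```

Standardizing against a baseline is a linear rescaling, so it leaves the within-subject correlation unchanged; its purpose is to put different subjects on a common scale before their data are pooled.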


Adaptive automation (AA) is the rebalancing of workload between the computer and
human. Low workload levels can be supplemented with usually routine tasks that will
keep the operator attentive, while providing the subject with additional mission
information. This “extra information” may not be critical, but it will keep the subject
from disengaging from the overall task. The goal of AA is to maintain peak performance
of the system, in the A1 to A3 regions. Kaber studied AA in terms of a simulated ATC
task in 2005. Forty non-professional participants were monitored for performance on a
primary task and a probe secondary task. Results showed that primary task performance
was greatest when AA was added to the system.34


MODELING THE AIR TRAFFIC CONTROL TASK


As in any profession, ATC personnel experience day-to-day variation in performance,
and there are natural variations between controllers. In order to study these differences,
mentioned in most of the studies detailed above, a model needs to be built of the
controller, the environment, and the task, with the goal of locating where the majority
of changes may be occurring, and where any augmentation may be best suited to assist
in performance.


Specific to air traffic controllers, the major source found when studying
large variations in performance was disruption of the circadian rhythm, leading to a
disequilibrium condition described as a biological instability. Fortunately, no sophisticated
technology was required to solve this particular problem, just proper human
resource management to avoid frequent shift switching.35


Loft proposed that modeling the ATC task complexity and workload is insufficient to 
predict performance due to the overriding effect of operator decision strategy. ATC 
operators can select priorities, manage their own cognitive resources, and thus regulate 
their own performance. The primary relief for the ATC operator is handing off traffic to 
another local operator.36 Our overall topic is concerned with a single pilot in a space
environment, where no room full of colleagues exists to take up the slack; therefore, 
such group modeling techniques are outside the scope of the current treatise. 


We do note that Loft develops excellent single-task descriptions of time pressure, 
conflict detection, conflict resolution, etc. 


As shown multiple times in the preceding section, single-task processing time and 
intensity (difficulty or complexity) are the primary drivers of workload. Developing a 
model connecting time, intensity, and effort, Hendy shows how decision time connects 
a time-intensity-effort loop (Figure 5). Hendy contends that decision time is the single
dominant variable in workload. Within this loop model, increasing the event rate is akin
to increasing task difficulty. The adaptation strategies are developed with training and 
experience, a possible explanation of the difference in junior and senior performance 
with augmentation aids. 


Averty contends that ATC workload cannot be directly measured, but must be inferred
from a quantifiable mixture of objective and subjective measures. He breaks
down the controller task into monitoring, vectoring, and conflict solving, and develops a
refinement of the NASA-TLX called TLI. Averty’s Traffic Load Index is based on the number
of aircraft, but each aircraft is given additional weight according to its processing
requirements on the controller, including both cognitive and emotional weight: for
example, aircraft with path conflicts to resolve are given the highest weight, while
isolated flyover traffic is given low weight. The authors conclude that TLI needs to
include physiological inputs as well to fully model the task-controller interaction.
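
The idea of a weighted aircraft count can be sketched as below. This is an illustration of the principle only, not Averty's actual index: the text gives just the ordering of the weights (conflicts highest, isolated flyovers low), so the state names and numeric weights here are hypothetical.

```python
# Hypothetical weights; only their ordering is taken from the description above.
WEIGHTS = {
    "conflict": 3.0,    # aircraft with a path conflict to resolve
    "vectoring": 2.0,   # aircraft being actively vectored
    "monitoring": 1.0,  # aircraft that only needs monitoring
    "flyover": 0.5,     # isolated flyover traffic
}


def traffic_load_index(aircraft_states, weights=WEIGHTS):
    """Sum a per-aircraft weight instead of simply counting aircraft."""
    return sum(weights[state] for state in aircraft_states)


# Two hypothetical sectors, each holding six aircraft.
sector_a = ["flyover"] * 6
sector_b = ["conflict", "conflict", "vectoring", "monitoring", "flyover", "flyover"]

print(traffic_load_index(sector_a))  # 3.0
print(traffic_load_index(sector_b))  # 10.0
```

Both sectors hold the same number of aircraft, yet the weighted load differs by more than a factor of three, which is the distinction a raw aircraft count cannot express.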

Figure 5. Information Processing Model for a Human Operator.37


In modeling the controllers themselves, one can look to selection efforts and find which
candidate skills are the best predictors of training success. Pre-strike data (see footnote
h on page 9) of ATC training showed that strong candidates had skills in spatial
relations, abstract reasoning, and math as well as oral decision-making.38


In designing a model, it is tempting to engage subject matter experts (SME) to 
estimate the demand of various resources within a multiple resource model of ATC. 
Cohen studied SME predictions of workload within a model of 7 channels (visual 
perception, auditory perception, spatial information processing, analytical information 
processing, verbal information processing, manual activity and speech). The authors 
concluded that using the 7-channel multiple resource model with the SME approach
does not work for predicting workload in the ATC task.39


Hancock has written extensively on the multiple-resource model of cognitive task 
performance under stress. He points out that functional brain imaging studies clearly 
show resources are separated anatomically, providing evidence that the multiple 
resource model should not be abandoned in future research.40


One proposed model held that local visual distractions interfere with cockpit
ATC tasks. The model is attractive given the considerable complexity of cockpit
displays and the popular notion that in-vehicle distraction is the root of all evil in
automobile driving. Iona proposed that a tunnel display for in-cockpit ATC information
would reduce the effect of outside visual distraction and increase primary task
performance measures. The authors concluded that this type of augmentation has little
effect on trained professional pilots in performance of their duties.41


In an essential foundation study for introducing forms of augmentation to the ATC task,
Wickens modeled the dual task environment of pilot traffic avoidance using alarms to 
augment detection of conflicts. The primary task was maintaining aircraft flightpath 
using a simulated cockpit display of a crosshair inside a box: the crosshair indicated 
aircraft direction and drifted toward the sides of the box if not corrected by the pilot 
using a joystick control. The drift rate was variable and could increase or decrease the 
difficulty of the primary task. The computer monitored for potential collisions and 
warned pilots if another aircraft was within 3 miles. 


Pilots were instructed to maintain their own aircraft flightpath first, and then detect 
conflicts. Upon alarm, the pilot was to examine an ATC display and recommend
re-routing of the conflicting aircraft. Pilots were informed that the automated detection
system may erroneously label some situations as conflicts; therefore the pilot needed to 
actually perform several ATC cognitive task functions to confirm the conflict before 
recommending action. Participants in the study were 12 student pilots. 


The results of this experiment showed that when augmentation was correct more than 
about 80% of the time, performance decreased due to decreased vigilance in 
confirming alarms properly on the ATC display. Additionally, with a high accuracy in the 
alarm rate, pilots did not regularly check the ATC display to ensure that the 
augmentation didn’t miss possible conflicts. Auditory and visual binary alarms were 
presented and the auditory alarms were more effective and did not interfere with the 
primary task (visual tracking). The authors concluded that a 20-25% false alarm rate 
was optimalⁿ when it is intended for the pilot to work alongside the automation rather
than rely on it.42 
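
How such an alarm reliability figure might be tallied from logged events is sketched below. The sketch takes "false alarm rate" to mean the share of raised alarms that did not correspond to a real conflict, which is an assumption about the source's usage; the function names, the event format, and the sample log are all hypothetical.

```python
def alarm_statistics(events):
    """Summarize alarm reliability from (alarm_raised, conflict_present) pairs.

    Returns (false_alarm_share, miss_share):
      false_alarm_share: fraction of raised alarms with no real conflict
      miss_share: fraction of real conflicts for which no alarm was raised
    """
    alarm_outcomes = [conflict for raised, conflict in events if raised]
    conflict_outcomes = [raised for raised, conflict in events if conflict]
    false_alarm_share = (
        alarm_outcomes.count(False) / len(alarm_outcomes) if alarm_outcomes else 0.0
    )
    miss_share = (
        conflict_outcomes.count(False) / len(conflict_outcomes)
        if conflict_outcomes
        else 0.0
    )
    return false_alarm_share, miss_share


def in_suggested_band(false_alarm_share, low=0.20, high=0.25):
    """True when the share falls in the 20-25% band reported as optimal."""
    return low <= false_alarm_share <= high


# Hypothetical log: 8 correct alarms, 2 false alarms, 1 missed conflict, 5 quiet periods.
log = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 5
)
false_share, miss_share = alarm_statistics(log)
print(false_share, in_suggested_band(false_share))
```

Tracking misses separately matters here because the study reported two distinct failures under highly reliable alarms: alarms were confirmed less carefully, and the display was checked less often for conflicts the automation had not flagged.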


The Wickens study was undertaken with the plan of moving ATC to a shared
responsibility of the controller and pilot, the goal being to increase airspace capacity by
moving some of the more mundane functions, like en-route course correction and
en-route conflict detection, primarily to cockpit control. This situation would be analogous to
a spacecraft pilot operating their primary vehicle manually while attending many
semi-automated ancillary vehicles. In a similar, though not identical, arrangement,
Landsdown studied TLX workload measures on drivers performing multiple in-vehicle
tasks. The authors concluded that secondary tasks significantly increased
perceived workload in this arrangement of task control.42


ⁿ This noise in the signal is analogous to adjusting the squelch level on a CB radio: too high a setting and you miss
traffic; too low a setting and all you hear is random noise.


Chapter 4: Studies in Command of Multiple
Semi-Automated Vehicles


Another analogue to remotely commanding several spacecraft is piloting or supervising
the operation of several unmanned ground or air vehicles (UGVs and UAVs). The primary
use of these vehicles is in military operations, which impose additional criteria on the
control system. Examples of several UAVs and their primary missions are shown in
Figure 6. Control of vehicles can be either in-theater, at a range of yards to tens of
miles, or from a long-range command and control center, such as the Predator
reconnaissance in the Iraq or Afghan theater executed from bases within the
continental United States.


In addition to single vehicle control systems, UAV swarms are being developed. In this
scenario a remote pilot executes a command to the swarm, which communicates
amongst itself to establish, for example, an RF emitter target location.44 In this type of
control system the raw number of vehicles under one pilot’s control can dramatically
increase, but the number of swarms then takes the place of the number of vehicles in
developing big picture cognitive limits. Such systems are also under development for
space exploration.45


In 2005, the US Army was operating two tactical surveillance UAVs: the Hunter and its 
newer replacement, the Shadow. Each UAV requires a team of two operators. Dixon
studied the workload of Hunter/Shadow operators and with the help of SMEs designed a 
simulation to determine if augmentation systems could increase the number of aircraft 
controlled per pilot from one-half to two. Pilots were responsible for mission completion 
(reconnaissance of a command target area), locating targets of opportunity (TOO), and 
on-board system monitoring. There were three levels of pilot aircraft control: baseline, 
autoalert, and autopilot. In the first two conditions, operators controlled the flight of the 
aircraft using a joystick to indicate direction; altitude and airspeed were held constant,
while a computer controlled the remaining flight parameters (pitch, bank, etc.). 
Occasionally pilots needed to compensate for wind changes. In the autopilot condition, 
operators entered the final coordinates of the next command target and the aircraft 
proceeded in a straight line, compensating automatically for wind changes. 


Pilots flew 10 straight flight legs. At the beginning of each leg, the command target was 
identified and instructions on what to locate were given. If the pilot forgot the 
instructions, they could hit a “repeat” button. At the end of each leg, the high-workload
tasks of loitering and zooming and panning the onboard camera to visualize the entire command
target were executed. Along each leg between command targets, pilots were instructed
to search for TOOs. Primary task completion included locating all relevant information 
about the command target. Secondary task completion included TOO identification and 
monitoring for an on-board system failure. The autoalert condition detected system 
failures and produced an audio alert when the command target was reached. 


Results showed that the autoalert augmentation dramatically decreased the time to 
locate system failures, as well as significantly decreasing the number of requested 
instruction repeats. Also, the autopilot augmentation dramatically decreased both 
flightpath deviation and the requested number of repeats, and also dramatically 
increased the number of TOO detections. Results were similar for the single and dual 
aircraft scenarios, though some performance, notably TOO detection (92% to 79%), did 
decrease between single and dual control cases. The authors attribute this to the UAV
interface complexity of 4 screens per vehicle. The authors conclude that further study is
required in augmentation strategies and system development.46


Figure 6. Examples of Unmanned Military Vehicles: (left to right, top to bottom) Predator, Global Hawk, Fire
Scout, Bell Eagle, BAE Mantis, rendering of an airwing of mixed UAVs, Tomahawk cruise missile, remote ground
vehicles, and the Raven.


Lee has previously shown that multiple automated ground vehicles can be autopiloted
successfully in a two-stage process when nominal information is available beforehand 
about the environment to be searched. The two-stage process includes an offline
permission routing table generation stage, in which the primary path of each vehicle is planned
and downloaded, and an online traffic control stage, in which small changes to the plan
are executed to avoid collisions and deal with contingent activity.47


Attempting to increase the limits of the UAV-to-pilot ratio, Ruff examined simulations using
three augmented control techniques: manual, management by consent, and 
management by exception. The authors concluded that the middle level of automation 
performed best when considering that augmentation algorithms may have associated 
errors. They also concluded that the absolute maximum number of UAVs a person could 
control is four. 


Cummings addresses the UAV interface issue in a study on retargeting multiple in-flight
cruise missiles. Cruise missiles were chosen because they require minimal active
piloting. A dual-screen interface was constructed, very similar to the ATC setup, with one
map and one list of objects being tracked (in the case of the ATC system, the list is
physical rather than a second computer display; see Figure 3b). Cummings side-steps
the issue of a cognitive workload metric based on complexity and number of tracked
objects by assuming that an operator can execute changes to only one vehicle at a time,
and then computing the ratio of time spent making changes to total time in a scenario.
This measure is called utilization, and previous work in systems engineering48 and
queuing theory has shown that utilization rates around 70% max out the typical human
operator's ability to hold a big picture. We note that this scenario involves minimal
interaction between the missiles, such as flight path conflicts. The missile study is
compared to the free flight ATC task, where en-route course correction and conflict resolution are the
responsibility of pilots rather than ATC operators.
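
The utilization measure described above reduces to a simple ratio, sketched here under the same one-vehicle-at-a-time assumption. The function name, the interval representation, and the sample numbers are hypothetical; only the busy-time-over-total-time definition and the roughly 70% ceiling come from the text.

```python
UTILIZATION_CEILING = 0.70  # approximate ceiling reported in the text


def utilization(service_intervals, scenario_duration):
    """Fraction of the scenario the operator spends making changes to vehicles.

    service_intervals: (start, end) times in seconds. Because the operator is
    assumed to service only one vehicle at a time, overlaps are rejected.
    """
    if scenario_duration <= 0:
        raise ValueError("scenario_duration must be positive")
    busy = 0.0
    previous_end = float("-inf")
    for start, end in sorted(service_intervals):
        if end < start:
            raise ValueError("interval ends before it starts")
        if start < previous_end:
            raise ValueError("intervals overlap; one vehicle at a time is assumed")
        busy += end - start
        previous_end = end
    return busy / scenario_duration


# Hypothetical 10-minute scenario with three retargeting episodes.
episodes = [(0, 60), (100, 220), (300, 540)]
u = utilization(episodes, scenario_duration=600)
print(f"utilization = {u:.2f}, above ceiling: {u > UTILIZATION_CEILING}")
```

Because the measure depends only on how long the operator is occupied, it sidesteps the need to score task complexity or count tracked objects directly, which is the simplification the text attributes to Cummings.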


Cummings develops several performance measures that are worthy, but for the current 
treatise an analogue is more succinct: the utilization measure in multiple-object 
tracking and retargeting is similar to a grandmaster playing many games of chess 
simultaneously. They walk from board to board, think for a bit, make a move, and then
move to the next board. Many high-level chess players can look at board position and 
evaluate what the next move is without knowledge of previous moves in the game. The 
question at hand is, how many simultaneous games can the grandmaster play before 
he is forced to revert to cold position analysis with every new presentation of a game? 
The conclusion is 16, and it agrees with previous work on free flight ATC.49 Cummings 
additionally takes issue with the Ruff limit of four UAVs, noting that this previous 
experiment included a far more demanding piloting task. 


More recent work by Cummings added aircraft heterogeneity to the experiment.50 She
concluded again that 70% utilization is optimal, but notes that the queuing theory
concept of wait times will significantly affect the maximum number of vehicles that can
be attended. In a multiple-vehicle control situation, a vehicle that has exhibited some
decrement in performance requires an interaction time with the operator to bring it back to
acceptable performance. It will then follow its automated routine for a period, called the
neglect time, until it again falls below the performance threshold and requires the pilot’s
attention. The time between the need for attention and the beginning of the next
interaction time is the wait time. Including wait times generated by more complex
interfaces reduces the maximum number of aircraft controllable to seven.51 A final
study looks at an abstract model of continuous re-planning in different time intervals.
This study observes that subjects reacted differently, falling into three groups, to automated
suggestions for re-planning. The authors conclude that human-automation consensus
is the primary driver of system performance.52 In other words, the human-task
interaction must be taken into account, as suggested by Averty, Loft, and others
outlined above.
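
The relationship among neglect time, interaction time, and wait time can be illustrated with a simple capacity estimate. The formula below, vehicles = NT / (IT + WT) + 1, is a form commonly used in the supervisory-control literature; the text above does not state it, so treat it as an assumption, and note that the function name and all timing values are hypothetical.

```python
def max_vehicles(neglect_time, interaction_time, wait_time=0.0):
    """Estimate how many vehicles one operator can cycle through.

    While one vehicle runs unattended for neglect_time, the operator can
    service other vehicles, each costing interaction_time plus any wait_time.
    All times must share the same unit.
    """
    if interaction_time <= 0:
        raise ValueError("interaction_time must be positive")
    if neglect_time < 0 or wait_time < 0:
        raise ValueError("times must be non-negative")
    return neglect_time / (interaction_time + wait_time) + 1.0


# Hypothetical timings in seconds (illustrative only).
without_wait = max_vehicles(neglect_time=100, interaction_time=20)
with_wait = max_vehicles(neglect_time=100, interaction_time=20, wait_time=5)
print(without_wait, with_wait)  # 6.0 5.0
```

Even a short wait time lowers the estimate, which matches the qualitative point in the text that wait times generated by more complex interfaces reduce the number of vehicles one operator can attend.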

Chapter 5: Discussion


We have insight into the maximum number of tracked objects in a multiple space 
vehicle piloting experiment: it depends greatly on the complexity of the piloting and 
mission tasks. Brookings showed that in the ATC task, mild complexity affected 
performance even for as few as six planes being tracked by professional controllers,
whereas these controllers regularly track up to ten. Ruff showed that when there is 
uncertainty in the augmentation system, a maximum of four craft can be controlled and 
tasked to complete missions. The number of four is consistent with standard estimates 
of human working memory being able to handle three to five disparate objects at a
time. This implies that disparate, complex interfaces require resources from working 
memory to prevent loss of the big picture. 


Augmentation of the human capabilities mainly appears to be helping to maintain a 
higher number of working memory registers. Whether it is the handwritten blocks for 
the ATCs, the stored instructions for the Dixon study, or the dual displays of Cummings, 
the most effective augmentations in the studies above hold information for quick visual 
retrieval that the brain would otherwise keep in working memory. 


Any external automation system to assist the operator in making decisions will have an 
associated error rate. It was also shown in the ATC and piloting tasks that alerts need 
to contain a level of noise (false alarms) of 20-25% to avoid automation bias. 


Regarding where the future of this work is headed, it is certain that the field is just 
getting started. Apollo spacecraft required dozens of ground operators to monitor for 
system failures, and just a few years ago it required two soldiers to operate a simple 
reconnaissance drone (most of them still do). It is fortunate that ATC and UAV control 
appear to be extremely applicable to the initial direction of remote space vehicle 
operations. The 5-year timeframe should see spacecraft-specific simulator studies begin 
to appear in major peer-reviewed journals. 


The major advance to come in developing augmented human capability to pilot multiple 
spacecraft will be in understanding the cognitive organization of multitasking. With 
brain imaging it has been shown that multiple resource theory seems to follow the
anatomical organization of the brain. In the next 40 years we will find out why the 
functional studies in multiple task completion don’t seem to follow the predictions of 
multiple resource theory. 



Chapter 6: Conclusions 


There is a lack of research in the area of cognitive limits on the number of spacecraft 
one pilot could control given any mission scenario. Two models for examining what are 
similar activities are air traffic control and remote piloting of multiple unmanned 
vehicles. We have shown the research progress in both areas, and the cognitive limits 
on the number of craft that can be simultaneously controlled are 16 for simple 
destination selection, 7 for moderately complex piloting and/or mission task completion, 
and 4 for complex heterogeneous craft. Future research may increase the automation 
component of aircraft and mission control, but there is no evidence to date that a 
complete mental picture can be maintained, even with external working memory 
augmentation, for more than about 16 objects at one time. 


We have additionally shown that physiological variables can be objectively employed to
indicate overload. Nominal success has been achieved in classifying physiological states
near high workload, which may make it possible to predict and thus prevent overload. We
expect this classification will be achieved with near perfect accuracy within five years of
specific studies being commenced.